Evaluation Set for Slovak News Information Retrieval

Authors

  • Daniel Hládek
  • Ján Stas
  • Jozef Juhár
Abstract

This work proposes an information retrieval evaluation set for the Slovak language. It provides 80 queries written in natural language, each paired with its set of relevant documents. The document collection contains 3980 newspaper articles sorted into 6 categories. Each document in a result set is manually annotated for relevance to its corresponding query. The evaluation set is mostly compatible with the Cranfield test collection and uses the same methodology for queries and relevance annotation. In addition, it provides annotations of document title, author, publication date, and category, which can be used to evaluate automatic document clustering and categorization.
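
As a rough illustration of how a Cranfield-style evaluation set of this kind is typically used, the sketch below scores a ranked result list against manual relevance judgments. The query and document identifiers, the `qrels` and `runs` dictionaries, and all function names are hypothetical and are not taken from the Slovak evaluation set itself; binary relevance is assumed.

```python
from typing import Dict, List, Set, Tuple


def precision_recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> Tuple[float, float]:
    """Precision and recall over the top-k documents of one ranked list."""
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant document appears."""
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]) -> float:
    """Average of per-query average precision over all judged queries."""
    if not qrels:
        return 0.0
    scores = [average_precision(runs.get(qid, []), rel) for qid, rel in qrels.items()]
    return sum(scores) / len(scores)


# Hypothetical toy data: relevance judgments (qrels) and system rankings (runs).
qrels = {"q1": {"d1", "d3"}, "q2": {"d2"}}
runs = {"q1": ["d1", "d2", "d3"], "q2": ["d3", "d2", "d1"]}

p_at_2, r_at_2 = precision_recall_at_k(runs["q1"], qrels["q1"], k=2)
map_score = mean_average_precision(runs, qrels)
```

With the real collection, `qrels` would hold the manually annotated relevant articles for each of the 80 queries, and `runs` would hold the ranking produced by the retrieval system under test.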

Similar resources

TUKE-BNews-SK: Slovak Broadcast News Corpus Construction and Evaluation

This article presents an overview of the existing acoustical corpuses suitable for broadcast news automatic transcription task in the Slovak language. The TUKE-BNews-SK database created in our department was built to support the application development for automatic broadcast news processing and spontaneous speech recognition of the Slovak language. The audio corpus is composed of 479 Slovak TV...

Knowledge Organization in a Multilingual System for the Personalization of Digital News Services: How to Integrate Knowledge

In this paper we are concerned with the type of services that send periodic news selections to subscribers of a digital newspaper by means of electronic mail. The aims are to study the influence of categorisation in information retrieval and in digital newspapers, different models to solve problems of bilingualism in digital information services and to analyse the evaluation in information filt...

Data-Driven Relevance Judgments for Ranking Evaluation

Ranking evaluation metrics are a fundamental element of design and improvement efforts in information retrieval. We observe that most popular metrics disregard information portrayed in the scores used to derive rankings, when available. This may pose a numerical scaling problem, causing an under- or over-estimation of the evaluation depending on the degree of divergence between the scores of rank...

Systematic Evaluation of Machine Translation Methods for Image and Video Annotation

In this study, we present a systematic evaluation of machine translation methods applied to the image annotation problem. We used the well-studied Corel data set and the broadcast news videos used by TRECVID 2003 as our dataset. We experimented with different models of machine translation with different parameters. The results showed that the simplest model produces the best performance. Based ...

Mapping Words Between Slovak Text and its Translation to English

Word alignment in texts translated to different languages is used in various applications such as cross-language information retrieval. To search for equivalent words in text translations various statistical methods, methods based on position of words in phrases and methods based on bilingual dictionaries are used. However it is very difficult to use these methods in languages with big morpholo...

Publication date: 2016